Top-k Query Evaluation with Probabilistic Guarantees

Authors

  • Martin Theobald
  • Gerhard Weikum
  • Ralf Schenkel
Abstract

Max-Planck Institute of Computer Science, D-66123 Saarbruecken, Germany
{mtb, weikum, schenkel}@mpi-sb.mpg.de

Top-k queries based on ranking elements of multidimensional datasets are a fundamental building block for many kinds of information discovery. The best known general-purpose algorithm for evaluating top-k queries is Fagin’s threshold algorithm (TA). Since the user’s goal behind top-k queries is to identify one or a few relevant and novel data items, it is intriguing to use approximate variants of TA to reduce run-time costs. This paper introduces a family of approximate top-k algorithms based on probabilistic arguments. When scanning index lists of the underlying multidimensional data space in descending order of local scores, various forms of convolution and derived bounds are employed to predict when it is safe, with high probability, to drop candidate items and to prune the index scans. The precision and the efficiency of the developed methods are experimentally evaluated based on a large Web corpus and a structured data collection.
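To make the two ideas in the abstract concrete, the following Python sketch may help. It is not the authors' implementation. The first function is a plain, exact version of the threshold algorithm with sum aggregation, assuming non-negative scores and treating an item missing from a list as scoring 0 there. The second part illustrates, under strong simplifying assumptions, how a convolution of per-list score distributions could estimate the probability that a partially seen candidate still reaches the top-k. The names `threshold_algorithm`, `convolve`, and `exceed_probability`, and the representation of score distributions as `{score: probability}` dictionaries assumed independent across lists, are all hypothetical choices made for this illustration.

```python
import heapq


def threshold_algorithm(index_lists, k):
    """Exact TA sketch with sum aggregation.

    index_lists: one list per dimension of (item, score) pairs, each sorted
    by descending score. Scores are assumed non-negative; an item missing
    from a list is assumed to score 0 there.
    Returns up to k (item, total_score) pairs, best first.
    """
    lookup = [dict(lst) for lst in index_lists]  # stands in for random access
    seen = set()
    top = []  # min-heap of (total_score, item); top[0] is the current k-th best
    max_depth = max((len(lst) for lst in index_lists), default=0)
    for depth in range(max_depth):
        threshold = 0.0  # best total any still-unseen item could reach
        for lst in index_lists:
            if depth >= len(lst):
                continue  # exhausted list: unseen items score 0 here
            item, score = lst[depth]  # sorted access
            threshold += score
            if item not in seen:
                seen.add(item)
                total = sum(d.get(item, 0.0) for d in lookup)
                heapq.heappush(top, (total, item))
                if len(top) > k:
                    heapq.heappop(top)
        # Stop once the k-th best total is at least the threshold.
        if len(top) == k and top[0][0] >= threshold:
            break
    return sorted(((item, total) for total, item in top), key=lambda t: -t[1])


def convolve(p, q):
    """Distribution of X + Y for independent discrete X ~ p and Y ~ q."""
    out = {}
    for a, pa in p.items():
        for b, pb in q.items():
            out[a + b] = out.get(a + b, 0.0) + pa * pb
    return out


def exceed_probability(partial_score, unseen_dists, min_k):
    """Estimate P(partial_score + sum of unseen scores > min_k).

    unseen_dists: one {score: probability} dict per list in which the
    candidate has not been seen yet (an assumed, precomputed estimate).
    """
    total = {0: 1.0}
    for dist in unseen_dists:
        total = convolve(total, dist)
    return sum(p for s, p in total.items() if partial_score + s > min_k)
```

An approximate variant in this spirit would drop a candidate whenever `exceed_probability(...)` falls below a chosen tolerance, instead of waiting until the candidate can be ruled out with certainty. In practice the per-list distributions would also need to be restricted to scores no larger than the current scan position in each list, which this sketch leaves to the caller.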

Similar resources

TopX: efficient and versatile top-k query processing for text, structured, and semistructured data

TopX is a top-k retrieval engine for text and XML data. Unlike Boolean engines, it stops query processing as soon as it can safely determine the k top-ranked result objects according to a monotonous score aggregation function with respect to a multidimensional query. The main contributions of the thesis unfold into four main points, confirmed by previous publications at international conference...

Processing Top-N Queries in P2P-based Web Integration Systems with Probabilistic Guarantees

Efficient query processing in P2P-based Web integration systems poses a variety of challenges resulting from the strict decentralization and limited knowledge. As a special problem in this context we consider the evaluation of top-N queries on structured data. Due to the characteristics of large-scaled P2P systems it is nearly impossible to guarantee complete and exact query answers without exh...

Sensitivity Analysis and Explanations for Robust Query Evaluation in Probabilistic Databases

Probabilistic database systems have successfully established themselves as a tool for managing uncertain data. However, much of the research in this area has focused on efficient query evaluation and has largely ignored two key issues that commonly arise in uncertain data management: First, how to provide explanations for query results, e.g., “Why is this tuple in my result ?” or “Why does this...

Probabilistic Databases: Where and How

Modern enterprise applications are forced to deal with unreliable, inconsistent and imprecise information in applications like search or business-intelligence. We propose here to use a probabilistic database to model such imprecisions and support complex, top-k SQL queries with ranked answers. We model all types of imprecisions as probabilistic data and evaluate SQL using a probabilistic semant...

Publication date: 2004